ConTextual Masked Auto-Encoder for Dense Passage Retrieval
Authors
Abstract
Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Precisely, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantical correlation between the text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE. Our code is available at https://github.com/caskcsg/ir/tree/main/cotmae.
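The retrieval step the abstract refers to, ranking passages by the similarity of their dense vectors to the query vector, can be illustrated with a minimal sketch. It is not CoT-MAE itself: the inner-product scoring, the function names `dot` and `retrieve`, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a trained encoder.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k highest-scoring passages."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    # Higher inner product is treated as more relevant (an assumption of this sketch).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would obtain these from an encoder.
passages = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
]
query = [1.0, 0.0, 0.0]

top = retrieve(query, passages, top_k=2)
```

With these toy vectors, passages 0 and 2 rank highest because they have the largest component along the query direction. A large corpus would normally use an approximate nearest-neighbor index instead of the exhaustive scan shown here.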
Similar resources
Structured Auto-Encoder
In this work, we present a technique that learns discriminative audio features for Music Information Retrieval (MIR). The novelty of the proposed technique is to design auto-encoders that make use of data structures to learn enhanced sparse data representations. The data structure is borrowed from the Manifold Learning field, that is data are supposed to be sampled from smooth manifolds, which ...
Auto-JacoBin: Auto-encoder Jacobian Binary Hashing
Binary codes can be used to speed up nearest neighbor search tasks in large scale data sets as they are efficient for both storage and retrieval. In this paper, we propose a robust auto-encoder model that preserves the geometric relationships of high-dimensional data sets in Hamming space. This is done by considering a noise-removing function in a region surrounding the manifold where the train...
Auto-encoder Based Data Clustering
Linear or non-linear data transformations are widely used processing techniques in clustering. Usually, they are beneficial to enhancing data representation. However, if data have a complex structure, these techniques would be unsatisfying for clustering. In this paper, based on the auto-encoder network, which can learn a highly non-linear mapping function, we propose a new clustering method. V...
Contractive De-noising Auto-Encoder
Auto-encoder is a special kind of neural network based on reconstruction. De-noising auto-encoder (DAE) is an improved auto-encoder which is robust to the input by corrupting the original data first and then reconstructing the original input by minimizing the reconstruction error function. And contractive auto-encoder (CAE) is another kind of improved auto-encoder to learn robust feature by int...
DAP: LSTM-CRF Auto-encoder
The LSTM-CRF is a hybrid graphical model which achieves state-of-the-art performance in supervised sequence labeling tasks. Collecting labeled data consumes lots of human resources and time. Thus, we want to improve the performance of LSTM-CRF by semi-supervised learning. Typically, people use pre-trained word representation to initialize models embedding layer from unlabeled data. However, the...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i4.25598